1 Introduction

1.1 RUV-III-NB model

To model the raw count for gene \(g\) and cell \(c\), \(y_{gc}\), RUV-III-NB uses a Negative Binomial (NB), \(y_{gc} \sim NB(\mu_{gc},\psi_g)\) or a Zero-inflated Negative Binomial (ZINB) random variable. For simplicity, in what follows we describe the NB representation.

Let \(\boldsymbol y_g=(y_{g1},y_{g2},\ldots,y_{gN})^T\) be the vector of count for gene \(g\) and \(\mu_g\) be the vector of mean parameters. We further assume that there are \(m \mbox{<} N\) groups of cells with the same underlying biology among the \(N\) cells, which we refer to as pseudo-replicates. Using a generalized linear model with log link function to model the mean parameters as a function of the unknown unwanted factors \(\mathbf W\) and the underlying biology represented by the replicate matrix \(\mathbf M\), we have

\[\begin{equation} \log \mu_g = \boldsymbol \zeta_g + \mathbf M\beta_g + \boldsymbol W\alpha_g, \label{eq:NBGLM} \end{equation}\]

where \(\boldsymbol M (N \times m)\) is a matrix of pseudo-replicates membership with \(\boldsymbol M(i,j)=1\) if the ith cell is part of the jth pseudo-replicate and 0 otherwise, \(\beta_g(m \times 1) \sim N(0,\lambda^{-1}_\beta I_m)\) is the vector of biological parameters, with unique value for each of the \(m\) replicates, \(\mathbf W (N \times k)\) is the k-dimensional unknown unwanted factors and \(\alpha_g (k \times 1) \sim N(\alpha_\mu,\lambda^{-1}_\alpha I_k)\) is the vector of regression coefficient associated with the unwanted factors and finally \(\zeta_g\) is the location parameter for gene \(g\) after adjusting for the unwanted factors.

Unwanted variation due to factors such as library size and batch effect are captured by the \(\mathbf W\) matrix. For example, when \(K=1\) and the \(\mathbf W\) column is approximately equal (up to a multiplicative constant) to log library size (LS) then \(\mu_g \propto (LS)^{\alpha_g}\), thus allowing a possibly non-linear, gene-specific relationship between library size and raw count.

To estimate the unknown unwanted factors and the regression coefficients, we use iterative reiweighted least squares (IRLS) algorithm that takes advantage of negative control genes (genes where \(\beta_g \approx 0\)) and sets of replicates (i.e., a collection of cells where the gene expression variation across the cells is solely due to the unwanted factors).

1.2 RUV-III-NB adjusted data

Once the unwanted factors are estimated, their effects are removed from the data, and subsequent downstream analyses such as clustering, trajectory and differential expression analyses are performed on the adjusted data. Here we demonstrate the use of RUV-III-NB adjusted data, that is the log percentile-invariant adjusted count (PAC), for downstream analyses such as clustering and differential analysis.

2 Example Datasets

2.0.1 Non-Small Cell Lung Cancer Cells (NSCLC) Dataset

The Non-Small Cell Lung Cancer Cells (NSCLC) dataset is freely available from the 10x Genomics website (www.10xgenomics.com). In this study, the cells have been sequenced in a single batch, however the sizes of the cells vary considerably. For example, epithelial cells are larger in size and immune cells such as T cells are smaller in size. Here we will see that the library size is correlated with cell size, and this needs to be taken into account in normalising the data. The input data format and the preliminary analyses will be demonstrated using this dataset.

2.0.2 Cell line Dataset

In the cell line dataset, cells have been sequenced in three separate batches using the 10x technology. The cell-types are known in this study, however, this known cell type factor is correlated with the batch variable. For example, one batch only contains the Jurkat cell line, another batch contains only the 293T cell line and the third batch contains a 50-50 mixture of both cell lines. In removing unwanted variation, this confounding needs to be taken into account.

3 Installing the required packages

The RUV-III-NB package can be installed from github as shown below.

#Install ruvIIInb package
devtools::install_github("limfuxing/ruvIIInb")
library(ruvIIInb)

Several other packages are required for pre-processing, visualising and the downstream analyses, and these are also shown in the following code.


library(SingleCellExperiment)
library(scater)
library(scran)
library(scuttle)
library(edgeR)
library(SingleR)
library(celldex)
library(hrbrthemes)
library(tidyverse)
library(ggplot2)
library(uwot)
library(scMerge)
library(Seurat)
library(randomcoloR)
library(dittoSeq)
library(pheatmap)
library(gridExtra)
library(igraph)

4 Input data format

For the input data format, we recommend first creating a SingleCellExperiment object that contains the raw data.

nsclc_obs<-readRDS(system.file("extdata", "nsclc_obs.rds", package="ruvIIInb"))
nsclc_samples<-readRDS(system.file("extdata", "nsclc_samples.rds", package="ruvIIInb"))

The NSCLC dataset consists of the following: (i) A count data matrix nsclc_obs with 33660 genes in rows and 7802 cells in columns, and (ii) A data frame that describes the samples which consists of the total number of counts for the cell (lib.size) and barcode (Barcode) for each cell in this study.

sce <- SingleCellExperiment(assays=list(counts=nsclc_obs),
                            colData=as.data.frame(nsclc_samples))

5 Preprocessing steps

Several quality control (QC) steps then need to be carried out on the data, and these can be computed using an external R package such as scuttle.

5.1 Computing cell-level quality control metrics

The cell-level quality control metrics such as the number of genes that have non-zero counts and the percentage of counts that comes from Mitochondrial genes for each cell can be computed and added to the SingleCellExperiment object as follows:

sce <- addPerCellQCMetrics(x = sce,subsets=list(Mito=grep("MT-",rownames(sce))))

The metrics calculated above may then be used to identify potential outliers. We recommend visualising the distribution of cells which contain a high proportion of cell-level counts for Mitochondrial genes as well as those which have low cell-level total counts, that are a certain number of median absolute deviations (MADs) away from the median value.

libsize_drop <- isOutlier(
  metric = sce$total, 
  nmads = 2,
  type = "lower", 
  log = TRUE)
colData(sce)$libsize_drop<-libsize_drop


mito_drop <- isOutlier(
  metric = colData(sce)$subsets_Mito_percent, 
  nmads = 3, 
  type = "higher")
colData(sce)$mito_drop<-mito_drop

Figures 5.1 and 5.2 visualise the distribution of cells that are flagged this way.

A histogram showing the distribution of log cell-wise total counts flagging those with low library size

Figure 5.1: A histogram showing the distribution of log cell-wise total counts flagging those with low library size

A scatterplot showing cell-level log number of detected genes against log total count

Figure 5.2: A scatterplot showing cell-level log number of detected genes against log total count

5.2 Computing gene-level quality control metrics

Similarly, we recommend computing gene-level quality control metrics such as the mean count across all cells for each gene, and the percentage of cells with non-zero counts for each gene. The code below adds these measures to the SingleCellExperiment object and flags genes with a low mean cell count.

sce <- addPerFeatureQCMetrics(x = sce)

#Remove genes with zero counts for each gene
sce<-sce[-which(rowData(sce)$mean==0),]

lowcount_drop <- isOutlier(
  metric = rowData(sce)$mean,
  nmads = 1.5, 
  type = "lower", 
  log = TRUE)
A histogram showing the distribution of
 gene-wise log mean counts

Figure 5.3: A histogram showing the distribution of gene-wise log mean counts

Once the computed quality control metrics have been explored, the code below retains only the selected genes and cells.

sce <- sce[!(lowcount_drop), !(libsize_drop | mito_drop)]

Following quality control, the count data matrix consists of 20352 genes in rows and 6382 cells in columns.

5.3 Optionally using only Highly Variable Genes(HVG)

If the computational resources are limited, the analyses may be continued using only highly variable genes (HVG). HVG are defined as genes with large variance compared to genes with the same mean expression levels, accounting for the fact that the variation in scRNA-seq data is related to the mean expression. The code below demonstrates the identification of HVG. In contrast to the analyses presented in the paper, , we retain only the HVG as shown in Figure 5.4.

sce <- computeSumFactors(sce,assay.type="counts")
data_norm_pre <- sweep(assays(sce)$counts,2,sce$sizeFactor,'/')
assays(sce, withDimnames=FALSE)$lognormcounts<- log(data_norm_pre+1)
hvg_df <- modelGeneVar(sce, assay.type = "lognormcounts")
hvg_df<- hvg_df[rownames(sce),]
hvg_genes   <- hvg_df$bio>quantile(hvg_df$bio,prob=0.75)
rowData(sce)$hvg_genes<-hvg_genes
sce<-sce[hvg_genes,]
Log normalized expression: mean vs variance

Figure 5.4: Log normalized expression: mean vs variance

6 Examples

6.1 Example I: NSCLC data

Following the preliminary steps described above, the pre-processed NSCLC dataset consists of 5088 genes in rows and 6382 cells in columns. We use this data matrix to demonstrate the application of RUV-III-NB to this dataset.

6.1.1 Normalising the data using RUV-III-NB

6.1.1.1 Identification of negative control genes

Instead of assuming that the components of unwanted variation in the data are observed, these are estimated from the data using a set of negative control genes. The variation in these negative control genes mostly consists of the underlying unwanted variation. In this application, we use the single-cell housekeeping genes described by Lin et al., 2019 as negative control genes.

# Reading in the control genes
data(Hs.schk)
ctl <- as.character(Hs.schk)
#compare overall log Library sizes vs log library sizes using control genes only
sce$logLS <- log(colSums(assays(sce)$counts))
rowData(sce)$ctlLogical<-rownames(assays(sce)$counts) %in% ctl
sce$logLS_ctl <- log(colSums(assays(sce)$counts[rowData(sce)$ctlLogical,]))

plot(sce$logLS,sce$logLS_ctl,col="grey",
     xlab="Log library size using all genes",
     ylab="Log library size using the control genes")
Comparing the overall log library size vs the log library size using the control genes

Figure 6.1: Comparing the overall log library size vs the log library size using the control genes

6.1.1.2 Identification of pseudo-replicates

As the NSCLC dataset only consists of a single batch, initial clustering is used to identify a set of homogeneous cells following scran normalisation. Cells which belong to the same initial cluster are then considered as pseudo-replicates. The code for doing this is shown below.


# Perform initial clustering to identify pseudo-replicates
snn_gr_init <- buildSNNGraph(sce, assay.type = "lognormcounts")
clusters_init <- igraph::cluster_louvain(snn_gr_init)
sce$cluster_init <- factor(clusters_init$membership)

# Identify cell-type for each cluster
ref_se <- HumanPrimaryCellAtlasData()

# Annotate initial cluster
pred10xnsclc_init  <- SingleR(test = sce, ref = ref_se, 
                              labels = ref_se$label.fine,
                              assay.type.test='lognormcounts',
                              clusters=factor(sce$cluster_init))
sce$ctype_init <- pred10xnsclc_init$labels[sce$cluster_init]

# Anotate cells
pred10xnsclc_init2  <- SingleR(test = sce, ref = ref_se, 
                              labels=ref_se$label.fine,
                              assay.type.test='lognormcounts')
sce$ctype_init2 <- pred10xnsclc_init2$labels


sce <- runUMAP(sce,exprs_values = "lognormcounts")
sce <- runPCA(sce,exprs_values = "lognormcounts")

Figure 6.2 visualises the initial clusters for identifying pseudo-replicates. The UMAP plots show that the cell type is correlated with the library size.

Visualising the initial clusters for identifying pseudo-replicates

Figure 6.2: Visualising the initial clusters for identifying pseudo-replicates

RUV-III-NB can be run with or without the use of pseudo-cells as we demonstrate here.

6.1.1.3 Running RUV-III-NB without pseudo-cells

The code shown below performs RUV-III-NB normalisation without the use of pseudo-cells.

# Construct the replicate matrix M using pseudo-replicates identified using initial clustering
M <- matrix(0,ncol(assays(sce)$counts),length(unique(sce$cluster_init)))
cl<- sort(unique(as.numeric(unique(sce$cluster_init))))
for(CL in cl) 
  M[which(as.numeric(sce$cluster_init)==CL),CL] <- 1

#RUV-III-NB code
ruv3nb_out<-ruvIII.nb(Y=as.matrix(assays(sce)$counts), # count matrix with genes as rows and cells as columns
                    M=M, #Replicate matrix constructed as above
                    ctl=rowData(sce)$ctlLogical, #A vector denoting control genes
                    k=2, # dimension of unwanted variation factors
                    strata=sce$cluster_init)

6.1.1.4 Running RUV-III-NB with pseudo-cells

In the above, the pseudo-replicates were identified through the use of initial clustering. As an optional step, we can strengthen the pseudo-replicates identified from this initial clustering using the pool and divide strategy. Note that we need to specify strata, so that the cells which belong to different strata (in this case, initial clusters we have identified) are not pooled together.


ruv3nb_out_ps<-ruvIII.nb(Y=as.matrix(assays(sce)$counts), 
                    M=diag(dim(sce)[2]),
                    ctl=rowData(sce)$ctlLogical,
                    k=2, 
                    use.pseudosample=TRUE, #Indicating whether pseudo-cells are to be used to to define pseudo-replicates
                    batch=rep(1,dim(sce)[2]),# NSCLC data comes from a single batch 
                    strata=sce$cluster_init #The cells which belong to initial clusters are not pooled together
                    )

6.1.2 Downstream analyses and assessing the normalisation

After carrying out the RUV-III-NB normalisation using the ruvIII.nb function, we can explore the estimated unwanted variation component which can be obtained using ruv3nb_out$W. In this example, we can see that the first component of unwanted variation is highly correlated with the log library size.

Scatterplots showing the log library size vs components of unwanted variation

Figure 6.3: Scatterplots showing the log library size vs components of unwanted variation

For downstream analyses, we recommend creating a SingleCellExperiment object. This SingleCellExperiment object contains the normalised data in the and components of the slot.

#Creating a SingleCellExperiment object
sce_ruv3nb <- makeSCE(ruv3nb_out,cData=colData(sce))

For seamless integration with the Seurat R package, the SingleCellExperiment object from above can be converted to a Seurat object using the code below and the functions in the Seurat package can then be used.

seurat_ruv3nb <- as.Seurat(sce_ruv3nb, counts = "counts", data ="logcorrected")

Once created, the SingleCellExperiment (SCE) object can be used for downstream analyses such as dimensional reduction, clustering and differential expression analysis.

6.1.2.1 Clustering

We now demonstrate performing principal components analysis on cells based on the Log of percentile-invariant adjusted count (PAC).

# PCA of RUV-III-NB log PAC data 
sce_ruv3nb <- runPCA(sce_ruv3nb, exprs_values = "logcorrected")
Visualising PCA following RUV-III-NB normalisation

Figure 6.4: Visualising PCA following RUV-III-NB normalisation

6.1.2.2 Annotation

The codes for performing cell-type identification, annotating clusters and cells, and visualising are shown below:


# Perform clustering
snn_gr <- buildSNNGraph(sce_ruv3nb, assay.type = "logcorrected")
clusters <- igraph::cluster_louvain(snn_gr)
sce_ruv3nb$cluster_ruv <- factor(clusters$membership)

# Identify cell-type for each cluster
ref_se <- HumanPrimaryCellAtlasData()

# annotate cluster
pred10xnsclc_ruv  <- SingleR(test = sce_ruv3nb, ref = ref_se, 
                             labels = ref_se$label.fine,
                             assay.type.test='logcorrected',
                             clusters=factor(sce_ruv3nb$cluster_ruv))
sce_ruv3nb$ctype_ruv <- pred10xnsclc_ruv$labels[sce_ruv3nb$cluster_ruv]

# anotate cells
pred10xnsclc_ruv2  <- SingleR(test = sce_ruv3nb, ref = ref_se, 
                              labels=ref_se$label.fine,
                              assay.type.test='logcorrected')
sce_ruv3nb$ctype_ruv2 <- pred10xnsclc_ruv2$labels

#UMAP
sce_ruv3nb <- runUMAP(sce_ruv3nb, exprs_values = "logcorrected")
Visualising the clusters following RUV-III-NB normalisation

Figure 6.5: Visualising the clusters following RUV-III-NB normalisation

6.1.2.3 Identification of differentially expressed genes

Differentially expressed (DE) genes can be identified using an external package such as scran. The findMarkers function in scran identifies a combination of genes that defines one cluster against the rest by performing pairwise comparisons. The code below demonstrates the identification of top DE genes for a specified cluster using RUV-III-NB adjusted data and visualising the log fold-changes in Figure 6.6.

markers_ruv3nb <- findMarkers(x=as.matrix(assays(sce_ruv3nb)$logcorrected),
                              groups=sce_ruv3nb$ctype_ruv)#identify a combination of marker genes that together defines one cluster against the rest. 
cluster3_markers <- markers_ruv3nb[[3]] #Cluster 3 ("CMP" cell-type) is used here for demonstration purposes.
cluster3_top5 <- cluster3_markers[cluster3_markers $Top <= 5,] #Collects the top DE genes from each pairwise comparison involving cluster 3 
logFCs_cluster3_top5 <- getMarkerEffects(cluster3_top5) #Extracts log fold changes
A heatmap of the DE log fold changes

Figure 6.6: A heatmap of the DE log fold changes

6.2 Example II: Cell line data, using known cell types to define pseudo-replicates

sce_celline<-readRDS(system.file("extdata", "celline_preprocessed.rds", package="ruvIIInb"))

#new.labels is misspelt
names_celline<-names(colData(sce_celline))
names(colData(sce_celline))[which(names_celline=="new.lables")]<-"new.labels"
names(colData(sce_celline))
#>  [1] "original.lables"       "batch"                 "batch.color"          
#>  [4] "new.labels"            "cell.color"            "cell.barcode"         
#>  [7] "sum"                   "detected"              "subsets_Mito_sum"     
#> [10] "subsets_Mito_detected" "subsets_Mito_percent"  "total"                
#> [13] "libsize_drop"          "mito_drop"

celline_obs<-as.matrix(assays(sce_celline)$counts)
celline_samples<-colData(sce_celline)
celline_rows<-rowData(sce_celline)

The cell line dataset consists of the following: (i) A count data matrix celline_obs which consists of 19754 genes in rows and 8796 cells in columns, and (ii) A data frame that describes the cells consisting of details such as the batch (batch) and the cell type (new.labels). The output following quality control is used above to demonstrate the concepts.

sce_celline <- SingleCellExperiment(assays=list(counts=celline_obs),
                            colData=as.data.frame(celline_samples),
                            rowData=as.data.frame(celline_rows))

6.2.1 Normalising the data using RUV-III-NB

As the cell types are known for this example, the replicate matrix M is constructed using the known cell-types.


# Construct the replicate matrix M using the known cell-types
unique_ctype<- unique(sce_celline$new.labels)
M <- matrix(0,ncol(assays(sce_celline)$counts),length(unique_ctype))
dim(M)
for(i in 1:length(unique_ctype))
 M[sce_celline$new.labels==unique_ctype[i],i] <- 1

#Read-in the control genes
data(Hs.schk)
ctl <- as.character(Hs.schk)
ctl <- rownames(assays(sce_celline)$counts) %in% ctl

#RUV-III-NB code
ruv3nb_out_celline<-ruvIII.nb(Y=as.matrix(assays(sce_celline)$counts),
                           M=M,
                           ctl=ctl,
                           k=2,
                           ncores = 6)

#Creating a SingleCellExperiment object
sce_ruv3nb_celline <- makeSCE(ruv3nb_out_celline,cData=colData(sce_celline))

#Converting to a Seurat object if necessary
seurat_ruv3nb_celline <- as.Seurat(sce_ruv3nb_celline, counts = "counts", data ="logcorrected")

6.2.2 Downstream analyses and assessing the normalisation

We can now explore the unwanted variation component. Here we see that the unwanted variation components capture the unwanted library size and batch variations.

Scatterplots showing batch, log library size vs the components of unwanted variation

Figure 6.7: Scatterplots showing batch, log library size vs the components of unwanted variation

6.2.2.1 Clustering

We now demonstrate performing clustering on the RUV-III-NB normalised data.

# PCA and UMAP of RUV-III-NB log PAC data 
sce_ruv3nb_celline <- runPCA(sce_ruv3nb_celline, exprs_values = "logcorrected")
sce_ruv3nb_celline <- runUMAP(sce_ruv3nb_celline, exprs_values = "logcorrected")
# PCA and UMAP of Scran normalized data 
sce_celline <- computeSumFactors(sce_celline,assay.type="counts")
data_norm_celline_pre <- sweep(assays(sce_celline)$counts,2,sce_celline$sizeFactor,'/')
assays(sce_celline, withDimnames=FALSE)$lognormcounts<- log(data_norm_celline_pre+1)
sce_celline <- runPCA(sce_celline,exprs_values = "lognormcounts")
sce_celline <- runUMAP(sce_celline,exprs_values = "lognormcounts")

`

Visualising PCA following RUV-III-NB normalisation of the cell line data

Figure 6.8: Visualising PCA following RUV-III-NB normalisation of the cell line data

Visualising the clusters following RUV-III-NB normalisation of the cell line data

Figure 6.9: Visualising the clusters following RUV-III-NB normalisation of the cell line data

6.2.2.2 Identification of differentially expressed genes

Using RUV-III-NB log adjusted data, the code below demonstrates the identification of top DE genes between 293t and Jurkat cell types.

markers_ruv3nb_celline <- findMarkers(x=as.matrix(assays(sce_ruv3nb_celline)$logcorrected),
                              groups=sce_ruv3nb_celline$new.labels)
markers_celline293t <- markers_ruv3nb_celline[["293t"]]
top5_celline293t <- markers_celline293t[markers_celline293t$Top <= 5,]
top5_celline293t
#> DataFrame with 5 rows and 5 columns
#>               Top   p.value       FDR summary.logFC logFC.jurkat
#>         <integer> <numeric> <numeric>     <numeric>    <numeric>
#> CD3D            1         0         0      -2.94228     -2.94228
#> CKB             2         0         0       3.04443      3.04443
#> ARHGDIB         3         0         0      -2.51722     -2.51722
#> TMSB4X          4         0         0      -3.27063     -3.27063
#> ADA             5         0         0      -2.16635     -2.16635

6.3 Example III: Cell line data, using scReplicate to define pseudo-replicates

6.3.1 Normalising the data using RUV-III-NB

In the absence of known cell types, RUV-III-NB could be used by defining the replicate matrix M using the scReplicate algorithm described by in the scMerge package. In this Section, we demonstrate how this can be implemented using cell line dataset treating cell-type as unknown.

# Construct the replicate matrix M using the `scReplicate` algorithm
sce_celline <- computeSumFactors(sce_celline,assay.type="counts")
data_norm_pre <- sweep(assays(sce_celline)$counts,2,sce_celline$sizeFactor,'/')
assays(sce_celline, withDimnames=FALSE)$lognormcounts<- log(data_norm_pre+1)
scm.M<-scReplicate(sce_combine=sce_celline,
                   exprs = "lognormcounts",
                   batch=as.numeric(factor(sce_celline$batch)),
                   kmeansK=c(1,1,2))
mode(scm.M) <- 'logical'


ruv3nb_out_celline_scMerge<- ruvIIInb::ruvIII.nb(as.matrix(assays(sce_celline)$counts),
                            M=scm.M,
                            ctl=ctl,
                            k=2, 
                            ncores = 6)

6.3.2 Downstream analyses and assessing the normalisation

#Creating a SingleCellExperiment object
sce_ruv3nb_celline_scMerge <- makeSCE(ruv3nb_out_celline_scMerge,
                                         cData=colData(sce_celline))

#Converting to a Seurat object if necessary
seurat_ruv3nb_celline_scMerge <- as.Seurat(sce_ruv3nb_celline_scMerge, counts = "counts", data ="logcorrected")
#> Warning: Feature names cannot have underscores ('_'), replacing with dashes
#> ('-')

#> Warning: Feature names cannot have underscores ('_'), replacing with dashes
#> ('-')

6.3.2.1 Clustering


# PCA and UMAP of RUV-III-NB log PAC data 
sce_ruv3nb_celline_scMerge <- runPCA(sce_ruv3nb_celline_scMerge, 
                                     exprs_values = "logcorrected")
sce_ruv3nb_celline_scMerge <- runUMAP(sce_ruv3nb_celline_scMerge, 
                                      exprs_values = "logcorrected")
Visualising the clusters following RUV-III-NB normalisation of the cell line data

Figure 6.10: Visualising the clusters following RUV-III-NB normalisation of the cell line data

6.3.2.2 Identification of differentially expressed genes

markers_ruv3nb_celline_scMerge <- findMarkers(x=as.matrix(assays(sce_ruv3nb_celline_scMerge)$logcorrected),
                                      groups=sce_ruv3nb_celline_scMerge$new.labels)
markers_celline293t_scMerge <- markers_ruv3nb_celline_scMerge[["293t"]]
top5_celline293t_scMerge <- markers_celline293t_scMerge[markers_celline293t_scMerge$Top <= 5,]
top5_celline293t_scMerge
#> DataFrame with 5 rows and 5 columns
#>               Top   p.value       FDR summary.logFC logFC.jurkat
#>         <integer> <numeric> <numeric>     <numeric>    <numeric>
#> CD3D            1         0         0      -2.92735     -2.92735
#> ARHGDIB         2         0         0      -2.50257     -2.50257
#> CKB             3         0         0       2.98830      2.98830
#> TMSB4X          4         0         0      -3.24825     -3.24825
#> CDKN2A          5         0         0       2.31375      2.31375

7 Session Information

sessionInfo()
#> R version 4.1.1 (2021-08-10)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: CentOS Linux 7 (Core)
#> 
#> Matrix products: default
#> BLAS:   /stornext/System/data/apps/R/R-4.1.1/lib64/R/lib/libRblas.so
#> LAPACK: /stornext/System/data/apps/R/R-4.1.1/lib64/R/lib/libRlapack.so
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
#> [8] methods   base     
#> 
#> other attached packages:
#>  [1] igraph_1.2.6                gridExtra_2.3              
#>  [3] pheatmap_1.0.12             dittoSeq_1.4.4             
#>  [5] randomcoloR_1.1.0.1         SeuratObject_4.0.2         
#>  [7] Seurat_4.0.4                scMerge_1.8.0              
#>  [9] uwot_0.1.10                 Matrix_1.3-4               
#> [11] forcats_0.5.1               stringr_1.4.0              
#> [13] dplyr_1.0.7                 purrr_0.3.4                
#> [15] readr_2.0.1                 tidyr_1.1.3                
#> [17] tibble_3.1.4                tidyverse_1.3.1            
#> [19] hrbrthemes_0.8.0            celldex_1.2.0              
#> [21] SingleR_1.6.1               edgeR_3.34.0               
#> [23] limma_3.48.3                scran_1.20.1               
#> [25] scater_1.20.1               ggplot2_3.3.5              
#> [27] scuttle_1.2.1               SingleCellExperiment_1.14.1
#> [29] SummarizedExperiment_1.22.0 Biobase_2.52.0             
#> [31] GenomicRanges_1.44.0        GenomeInfoDb_1.28.2        
#> [33] IRanges_2.26.0              S4Vectors_0.30.0           
#> [35] BiocGenerics_0.38.0         MatrixGenerics_1.4.2       
#> [37] matrixStats_0.60.1          ruvIIInb_0.7.6.3           
#> 
#> loaded via a namespace (and not attached):
#>   [1] rappdirs_0.3.3                scattermore_0.7              
#>   [3] ZIM_1.1.0                     bit64_4.0.5                  
#>   [5] knitr_1.33                    irlba_2.3.3                  
#>   [7] DelayedArray_0.18.0           data.table_1.14.0            
#>   [9] rpart_4.1-15                  KEGGREST_1.32.0              
#>  [11] RCurl_1.98-1.4                generics_0.1.0               
#>  [13] ScaledMatrix_1.0.0            cowplot_1.1.1                
#>  [15] RSQLite_2.2.8                 RANN_2.6.1                   
#>  [17] proxy_0.4-26                  future_1.22.1                
#>  [19] bit_4.0.4                     tzdb_0.1.2                   
#>  [21] spatstat.data_2.1-0           xml2_1.3.2                   
#>  [23] lubridate_1.7.10              httpuv_1.6.2                 
#>  [25] assertthat_0.2.1              viridis_0.6.1                
#>  [27] xfun_0.25                     hms_1.1.0                    
#>  [29] jquerylib_0.1.4               evaluate_0.14                
#>  [31] promises_1.2.0.1              fansi_0.5.0                  
#>  [33] caTools_1.18.2                dbplyr_2.1.1                 
#>  [35] readxl_1.3.1                  DBI_1.1.1                    
#>  [37] htmlwidgets_1.5.3             spatstat.geom_2.2-2          
#>  [39] ellipsis_0.3.2                RSpectra_0.16-0              
#>  [41] backports_1.2.1               V8_3.4.2                     
#>  [43] bookdown_0.24.2               deldir_0.2-10                
#>  [45] sparseMatrixStats_1.4.2       vctrs_0.3.8                  
#>  [47] ROCR_1.0-11                   abind_1.4-5                  
#>  [49] cachem_1.0.6                  withr_2.4.2                  
#>  [51] sfsmisc_1.1-11                bdsmatrix_1.3-4              
#>  [53] checkmate_2.0.0               sctransform_0.3.2            
#>  [55] goftest_1.2-2                 cluster_2.1.2                
#>  [57] ExperimentHub_2.0.0           lazyeval_0.2.2               
#>  [59] crayon_1.4.1                  labeling_0.4.2               
#>  [61] pkgconfig_2.0.3               nlme_3.1-152                 
#>  [63] vipor_0.4.5                   nnet_7.3-16                  
#>  [65] rlang_0.4.11                  globals_0.14.0               
#>  [67] miniUI_0.1.1.1                lifecycle_1.0.0              
#>  [69] startupmsg_0.9.6              filelock_1.0.2               
#>  [71] extrafontdb_1.0               BiocFileCache_2.0.0          
#>  [73] modelr_0.1.8                  rsvd_1.0.5                   
#>  [75] AnnotationHub_3.0.1           polyclip_1.10-0              
#>  [77] cellranger_1.1.0              lmtest_0.9-38                
#>  [79] zoo_1.8-9                     reprex_2.0.1                 
#>  [81] base64enc_0.1-3               beeswarm_0.4.0               
#>  [83] ggridges_0.5.3                png_0.1-7                    
#>  [85] viridisLite_0.4.0             bitops_1.0-7                 
#>  [87] KernSmooth_2.23-20            Biostrings_2.60.2            
#>  [89] blob_1.2.2                    DelayedMatrixStats_1.14.2    
#>  [91] parallelly_1.28.1             jpeg_0.1-9                   
#>  [93] beachmat_2.8.1                scales_1.1.1                 
#>  [95] memoise_2.0.0                 magrittr_2.0.1               
#>  [97] plyr_1.8.6                    ica_1.0-2                    
#>  [99] gplots_3.1.1                  zlibbioc_1.38.0              
#> [101] compiler_4.1.1                dqrng_0.3.0                  
#> [103] bbmle_1.0.24                  RColorBrewer_1.1-2           
#> [105] fitdistrplus_1.1-5            cli_3.0.1                    
#> [107] XVector_0.32.0                listenv_0.8.0                
#> [109] pbapply_1.5-0                 patchwork_1.1.1              
#> [111] htmlTable_2.2.1               Formula_1.2-4                
#> [113] MASS_7.3-54                   mgcv_1.8-36                  
#> [115] tidyselect_1.1.1              stringi_1.7.4                
#> [117] highr_0.9                     yaml_2.2.1                   
#> [119] BiocSingular_1.8.1            locfit_1.5-9.4               
#> [121] latticeExtra_0.6-29           ggrepel_0.9.1                
#> [123] grid_4.1.1                    sass_0.4.0                   
#> [125] tools_4.1.1                   future.apply_1.8.1           
#> [127] ruv_0.9.7.1                   rstudioapi_0.13              
#> [129] bluster_1.2.1                 foreign_0.8-81               
#> [131] metapod_1.0.0                 farver_2.1.0                 
#> [133] Rtsne_0.15                    digest_0.6.27                
#> [135] distr_2.8.0                   BiocManager_1.30.16          
#> [137] shiny_1.6.0                   Rcpp_1.0.7                   
#> [139] broom_0.7.9                   BiocVersion_3.13.1           
#> [141] later_1.3.0                   RcppAnnoy_0.0.19             
#> [143] httr_1.4.2                    gdtools_0.2.3                
#> [145] AnnotationDbi_1.54.1          colorspace_2.0-2             
#> [147] tensor_1.5                    rvest_1.0.1                  
#> [149] fs_1.5.0                      reticulate_1.22              
#> [151] reldist_1.6-6                 splines_4.1.1                
#> [153] statmod_1.4.36                spatstat.utils_2.2-0         
#> [155] plotly_4.9.4.1                systemfonts_1.0.2            
#> [157] xtable_1.8-4                  jsonlite_1.7.2               
#> [159] M3Drop_1.18.0                 R6_2.5.1                     
#> [161] Hmisc_4.5-0                   pillar_1.6.2                 
#> [163] htmltools_0.5.2               mime_0.11                    
#> [165] glue_1.4.2                    fastmap_1.1.0                
#> [167] BiocParallel_1.26.2           BiocNeighbors_1.10.0         
#> [169] interactiveDisplayBase_1.30.0 codetools_0.2-18             
#> [171] mvtnorm_1.1-2                 utf8_1.2.2                   
#> [173] spatstat.sparse_2.0-0         lattice_0.20-44              
#> [175] bslib_0.2.5.1                 numDeriv_2016.8-1.1          
#> [177] curl_4.3.2                    ggbeeswarm_0.6.0             
#> [179] leiden_0.3.9                  gtools_3.9.2                 
#> [181] Rttf2pt1_1.3.9                survival_3.2-11              
#> [183] rmarkdown_2.10                munsell_0.5.0                
#> [185] GenomeInfoDbData_1.2.6        reshape2_1.4.4               
#> [187] haven_2.4.3                   gtable_0.3.0                 
#> [189] spatstat.core_2.3-0           extrafont_0.17